# Multilingual OCR

PP-OCRv4 Mobile Det

PP-OCRv4_mobile_det is an efficient text detection model from the PaddleOCR team. It is optimized for mobile devices and suited to deployment on edge hardware.

Tags: Text Recognition, Supports Multiple Languages
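A detection model such as PP-OCRv4_mobile_det only locates text regions. In a typical OCR pipeline each region is then cropped and passed to a recognition model such as the PP-OCRv5 recognizers listed here. The sketch below shows that flow under simplifying assumptions: `detect` and `recognize` are hypothetical stand-ins for real models, boxes are axis-aligned `(x0, y0, x1, y1)` tuples (real detectors may return rotated quadrilaterals), and a top-to-bottom, left-to-right sort only approximates reading order. It is not PaddleOCR's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Representations assumed for this sketch only.
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive
Image = List[List[int]]          # grayscale image as rows of pixel values


@dataclass
class OcrLine:
    box: Box
    text: str


def crop(image: Image, box: Box) -> Image:
    """Cut the region x0 <= x < x1, y0 <= y < y1 out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def run_ocr(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognize: Callable[[Image], str],
) -> List[OcrLine]:
    """Stage 1: detect text regions. Stage 2: recognize each cropped region."""
    # Sort by top edge, then left edge, as a rough reading order.
    boxes = sorted(detect(image), key=lambda b: (b[1], b[0]))
    return [OcrLine(box=b, text=recognize(crop(image, b))) for b in boxes]


if __name__ == "__main__":
    # Stub models: the hypothetical "recognizer" just reports the crop size.
    page = [[0] * 10 for _ in range(4)]
    fake_detect = lambda img: [(5, 1, 10, 4), (0, 0, 4, 2)]
    fake_recognize = lambda c: f"{len(c[0])}x{len(c)}"
    for line in run_ocr(page, fake_detect, fake_recognize):
        print(line.box, line.text)  # (0, 0, 4, 2) 4x2, then (5, 1, 10, 4) 5x3
```

Keeping detection and recognition as separate callables mirrors how detector and recognizer checkpoints are published as separate entries, so either stage can be swapped independently.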

PP-OCRv5 Mobile Rec

PP-OCRv5_mobile_rec is the PaddleOCR team's latest-generation text line recognition model. It recognizes four languages (Simplified Chinese, Traditional Chinese, English, and Japanese) and is designed for a wide range of complex text scenarios.

Tags: Text Recognition, Supports Multiple Languages

PP-OCRv5 Server Rec

PP-OCRv5_server_rec is the server variant of the PaddleOCR team's latest-generation text line recognition model, supporting multilingual recognition in complex text scenarios.

Tags: Text Recognition, Supports Multiple Languages

Florence Base Mixed Line BBox OCR

An image-to-text model fine-tuned from the Microsoft Florence-2 foundation model. It supports Swedish and English and specializes in optical character recognition of historical handwritten text.

Mistral Small 1

An image-text-to-text model built on Mistral-Small-3.1-24B-Instruct-2503, with multilingual support.

Tags: Safetensors, Supports Multiple Languages

CreitinGameplays

InternVL3 2B AWQ

InternVL3-2B is a multimodal large language model (MLLM) developed by OpenGVLab, with strong multimodal perception and reasoning. It supports tool use, GUI agents, industrial image analysis, 3D visual perception, and more. This entry is its AWQ-quantized build.

Tags: Transformers, Other
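A back-of-the-envelope sketch of why a quantized build like this one saves memory: weight memory is roughly parameter count × bits per weight ÷ 8 bytes. It assumes a nominal 2 billion parameters and 4-bit weights (a common AWQ setting, not confirmed here for this checkpoint), and it ignores activations, the KV cache, quantization scales, and any layers kept at higher precision, so real VRAM use will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory taken by model weights alone, in GiB (2**30 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Nominal 2B parameters (an assumption); the checkpoint's exact count may differ.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gib(2e9, bits):.2f} GiB")
    # prints 3.73, 1.86 and 0.93 GiB respectively
```

Going from 16-bit to 4-bit weights cuts this term by a factor of 4, which is the main source of the savings that quantized builds advertise.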

PaliGemma2 3B Mix 224 JAX

PaliGemma 2 is an upgraded vision-language model based on Gemma 2. It accepts multilingual image-and-text input, produces text output, and is designed specifically for vision-language tasks.

MiniCPM-o 2.6 Int4

The int4-quantized version of MiniCPM-o 2.6, which substantially reduces GPU memory (VRAM) usage while retaining multimodal processing capabilities.

Tags: Transformers, Other
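To make "int4 quantized" concrete, here is a minimal sketch of symmetric per-group 4-bit quantization: weights are split into groups, each group shares one floating-point scale, and each weight is stored as an integer in [-8, 7]. It illustrates the general technique only; the group size is an arbitrary choice, and the method actually used to produce the MiniCPM-o 2.6 int4 checkpoint is not described here and may differ.

```python
from typing import List, Sequence, Tuple


def quantize_int4(weights: Sequence[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric per-group int4 quantization.

    Each group of `group_size` weights shares one scale chosen so that the
    largest magnitude in the group maps to 7; codes are clamped to [-8, 7].
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7 if max_abs > 0 else 1.0  # avoid dividing by zero
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes: Sequence[int], scales: Sequence[float], group_size: int = 4) -> List[float]:
    """Reconstruct approximate weights: each code times its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    w = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.0]
    codes, scales = quantize_int4(w)
    print("codes: ", codes)
    print("scales:", scales)
    print("approx:", [round(x, 3) for x in dequantize_int4(codes, scales)])
```

Each reconstructed weight is off by at most half a scale step, which is the accuracy cost traded for storing 4 bits per weight instead of 16.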

PaliGemma2 28B Mix 224

PaliGemma 2 is an upgraded vision-language model from Google that combines Gemma 2 with the SigLIP vision model and supports multilingual image-text tasks.

PaliGemma2 28B Mix 448

PaliGemma 2 is a vision-language model based on Gemma 2 that takes image and text input and produces text output, suitable for a variety of vision-language tasks.

PaliGemma2 10B Mix 224

PaliGemma 2 is a vision-language model based on Gemma 2, supporting image and text input to generate text output, suitable for various vision-language tasks.

PaliGemma2 3B Mix 448

PaliGemma 2 is a vision-language model based on Gemma 2 that accepts image and text inputs and generates text output, suitable for a variety of vision-language tasks.
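The 224 and 448 suffixes on the PaliGemma 2 Mix entries denote the input image resolution. A quick sketch of how resolution affects sequence length, assuming a ViT-style encoder that cuts a square image into non-overlapping 14×14-pixel patches (the patch size of the SigLIP-So400m/14 encoder PaliGemma is reported to use) and emits one token per patch:

```python
def image_tokens(resolution: int, patch_size: int = 14) -> int:
    """Patch tokens for a square image cut into non-overlapping patches."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of patch_size")
    patches_per_side = resolution // patch_size
    return patches_per_side * patches_per_side


if __name__ == "__main__":
    for res in (224, 448):
        print(res, "px ->", image_tokens(res), "image tokens")
    # 224 px -> 256 image tokens; 448 px -> 1024 image tokens
```

Under these assumptions, doubling the resolution quadruples the image tokens the language model must process, which is the usual trade-off between the cheaper 224 variants and the 448 variants that keep finer detail such as small text.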

A Devanagari optical character recognition model based on the TrOCR architecture, fine-tuned specifically for Nepali/Devanagari script.

Tags: Text Recognition, Transformers, Other

A Thai and English optical character recognition model fine-tuned from the TrOCR base handwritten model, designed for images of handwritten text lines.

Tags: Text Recognition, Transformers, Supports Multiple Languages

This model is trained specifically for Urdu OCR and works best on images containing a single line of Urdu text, primarily printed text.

Tags: Text Recognition, Transformers, Other
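Because this model works best on single-line images, a full page is usually split into line images first. One classic, minimal approach is a horizontal projection profile: count ink pixels in each row and treat runs of inked rows as text lines. The sketch assumes an already binarized, deskewed image (1 = ink, 0 = background). It is not part of the model, and it may merge or split lines where ascenders and descenders overlap, which can happen in Urdu Nastaliq typography, so a dedicated line detector may be needed in practice.

```python
from typing import List, Tuple


def split_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines; bottom is exclusive.

    A row counts as part of a line if it has at least `min_ink` ink pixels.
    """
    ranges: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new line begins
        elif not has_ink and start is not None:
            ranges.append((start, y))      # the current line ends
            start = None
    if start is not None:                  # a line touching the bottom edge
        ranges.append((start, len(binary)))
    return ranges


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 1],
    ]
    print(split_lines(page))  # [(1, 3), (4, 5)]
```

Each returned range can be sliced out of the page (`page[top:bottom]`) and passed to the recognizer as a single-line image.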

TrOCR Medieval Cursiva

A TrOCR-based model for recognizing medieval cursive script, designed for handwritten Latin, French, Italian, Spanish, and Catalan texts of the medieval period.

Tags: Text Recognition, Transformers, Supports Multiple Languages

TrOCR-Ru is an optical character recognition model based on microsoft/trocr-base-handwritten and fine-tuned on synthetic Russian and English datasets for image-to-text tasks.

Tags: Text Recognition, Transformers, Supports Multiple Languages

TrOCR Base Finetune Numbers

TrOCR is a Transformer-based optical character recognition model designed to extract text content from images.

Tags: Transformers, English
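TrOCR pairs an image encoder with a text decoder that produces the output one token at a time. The simplest way to run such a decoder is greedy decoding, sketched below: at each step, append the highest-scoring next token and stop at the end-of-sequence token. `next_token_scores` is a hypothetical stand-in for the decoder conditioned on the encoded image, the token ids are illustrative, and a real checkpoint's generation settings (for example, beam search) may differ.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[List[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 32,
) -> List[int]:
    """Autoregressive greedy decoding: always append the argmax token."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(tokens)
        best = max(range(len(scores)), key=lambda i: scores[i])
        tokens.append(best)
        if best == eos_id:
            break
    return tokens


if __name__ == "__main__":
    # Toy "decoder" that deterministically spells out ids 5, 3, 7, then EOS (1).
    script = [5, 3, 7, 1]

    def toy_scores(prefix: List[int]) -> List[float]:
        scores = [0.0] * 8
        scores[script[len(prefix) - 1]] = 1.0
        return scores

    print(greedy_decode(toy_scores, bos_id=0, eos_id=1))  # [0, 5, 3, 7, 1]
```

The `max_len` cap guarantees termination even if the decoder never emits the end-of-sequence token.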

A Transformer-based OCR system designed for recognizing Central Kurdish text, trained on synthetic data.

Tags: Text Recognition

Pix2Struct OCR-VQA Base

A Pix2Struct visual question answering model fine-tuned for the OCR-VQA task: it reads the text in an image and answers questions about it.

Tags: Transformers, Supports Multiple Languages

Pix2Struct DocVQA Base

Pix2Struct is an image-encoder/text-decoder model trained on image-text pairs, supporting tasks that include image captioning and visual question answering.

Tags: Transformers, Supports Multiple Languages

Pix2Struct ChartQA Base

Pix2Struct is an image-encoder/text-decoder model trained on image-text pairs for multiple tasks; this checkpoint is fine-tuned for chart question answering.

Tags: Transformers, Supports Multiple Languages

Donut Base Finetuned Latvian Receipts

This model is donut-base fine-tuned on a Latvian receipt dataset, intended mainly for processing receipt images.

Tags: Text Recognition
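Donut models generate a tagged token sequence rather than plain text, and the tags are then converted into structured fields; Hugging Face Transformers' `DonutProcessor` provides `token2json` for that step. The sketch below only illustrates the idea for flat, non-nested fields of the form `<s_name>value</s_name>`. The field names used (`store`, `total`) are hypothetical, since the output schema of this Latvian receipt model is not given here.

```python
import re
from typing import Dict

# Matches <s_name>value</s_name>; \1 forces the closing tag to match the opening one.
_FIELD = re.compile(r"<s_([A-Za-z0-9_]+)>(.*?)</s_\1>", re.DOTALL)


def parse_flat_fields(sequence: str) -> Dict[str, str]:
    """Turn a flat Donut-style tag sequence into a dict of field -> text.

    Nested tags are not unpacked, and a repeated field keeps its last value.
    """
    return {name: value.strip() for name, value in _FIELD.findall(sequence)}


if __name__ == "__main__":
    seq = "<s_store>Rimi</s_store><s_total> 12.50 </s_total>"
    print(parse_flat_fields(seq))  # {'store': 'Rimi', 'total': '12.50'}
```

For real Donut output with nested or repeated groups, the library's own conversion is the safer choice; this sketch only shows what that conversion does for the simplest case.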

docTR Torch CRNN MobileNet V3 Large French

An optical character recognition (OCR) model from docTR, an OCR library built on TensorFlow 2 and PyTorch that supports multilingual text detection and recognition.

Tags: Text Recognition, Transformers, Supports Multiple Languages
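CRNN recognizers such as this one emit a score vector for each horizontal slice (frame) of the text-line image and are commonly trained with CTC (connectionist temporal classification). The simplest CTC decoder, best-path decoding, takes the top class per frame, merges consecutive repeats, and drops the blank symbol. The sketch assumes the blank is the class right after the last character and uses a toy four-letter charset; docTR's actual vocabulary and blank index are not specified here.

```python
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], charset: str, blank: int) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, remove blanks."""
    chars: List[str] = []
    prev = None
    for scores in frame_scores:
        idx = argmax(scores)
        # Emit only when the class changes and is not blank; a blank between two
        # identical classes is what allows doubled letters such as "ll".
        if idx != blank and idx != prev:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


def one_hot(idx: int, size: int) -> List[float]:
    v = [0.0] * size
    v[idx] = 1.0
    return v


if __name__ == "__main__":
    charset = "helo"                          # classes 0..3; blank is class 4
    path = [0, 0, 1, 4, 2, 2, 4, 2, 3, 3]     # hypothetical per-frame best classes
    frames = [one_hot(i, len(charset) + 1) for i in path]
    print(ctc_greedy_decode(frames, charset, blank=len(charset)))  # hello
```

Best-path decoding is fast but approximate; beam-search CTC decoders can recover slightly better transcriptions at higher cost.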

docTR TF CRNN VGG16 BN French

An optical character recognition model from docTR, an OCR library built on TensorFlow 2 and PyTorch that supports multilingual document recognition.

Tags: Text Recognition, Transformers, Supports Multiple Languages


© 2025 AIbase